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Drought forecasting is important in preparing for drought and its mitigation 
plan. This study focuses on the investigating the performance of Auto 
Regressive Integrated Moving Average (ARIMA) and Empirical Wavelet 
Transform (EWT)-ARIMA based on clustering analysis in forecasting 
drought using Standard Precipitation Index (SPI). Daily rainfall data from 
Arau, Perlis from 1956 to 2008 was used in this study. SPI data of 3, 6, 9, 12 
and 24 months were then calculated using the rainfall data. EWT is employed 
to decompose the time series into several finite modes. The EWT is used to 
create Intrinsic Mode Functions (IMF) which are used to create ARIMA 
models. Fuzzy c-means clustering is used on the instantaneous frequency 
given by Hilbert Transform of the IMF to create several clusters. The 
objective of this study is to compare the effectiveness of the methods in 
accurately forecasting drought in Arau, Malaysia. It was found that the 
proposed model performed better compared to ARIMA and EWT-ARIMA. 
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1. INTRODUCTION 

Drought is not unfamiliar and happens across the world. Drought can be destructive to lives of 
people in an area. Drought is a natural hazard and can be defined as a deficiency in precipitation over an 
extended period, which is usually a season or more that causes water shortage. Droughts occur in all climatic 
zone, which are given by the drop of precipitation over a long period of time over a season or a year [1]. 

Drought forecasting can be done either using physical model or data driven models. Physical models 
are more complex than data driven models since they require more data as its input, while, data driven 
models require less information and have rapid development times [2]. They are also found to be accurate 
enough, thus they are widely used in hydrological forecasting. A number of studies used autoregressive 
integrated moving average (ARIMA), a stochastic model, to forecast drought [3], such as in [4], [5]. 
Mossad and Alazba [5] used ARIMA to forecast drought in arid region in Kingdom of Saudi Arabia. 
Mishra et. al. [4] used a hybrid stochastic and neural network model to forecast drought in Kansabati region 
in India. Generally, ARIMA was found to be suitable to be used for drought forecasting using standard 
precipitation index (SPI) time series. In other field, ARIMA was shown to be able to effectively forecast 
Malaysian gold bullion price [6]. Stochastic methods such as ARIMA are good in forecasting linear time 
series, however they does not perform as well when forecasting non-linear time series [7]. 

Wavelet transform are used to improve forecast accuracy, where they are applied along with 
artificial intelligence models and stochastic models. Wavelet transform is used to decompose the time series 
into multiple sub series of smooth and detailed component [8]. Wavelets have been shown to increases the 
forecast accuracy of ARIMA [9], [10] and artificial neural network models [11]. In [10], Kriechbaumer et al. 
found out that the use of Wavelet transform with ARIMA increases the accuracy of metal price forecast. 
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The study investigates numerous combination of mother wavelet and decomposition levels and found out that 
both variables affects forecasting accuracy. Wavelets separate the seasonal information and the detailed 
information in a time series, where the linear and non-linear property of the time series is extracted [8]. 

Multi Resolution Analysis (MRA) have been employed in forecasting using wavelets. The 
application of MRA in forecasting is as follow, firstly, the time series is decomposed to obtain smooth series 
and multiple detail series. Next, forecasting method, such as ARIMA is commonly used to extend the 
subseries. Lastly, the multiple subseries is reconstructed to the original time series [9]. In DWT, this is done 
by obtaining the summation of each of the subseries. Forecasting using MRA is capable of obtain a more 
accurate forecasting compared to directly forecasting the original series. This is because the variance of the 
subseries is more stable and they typically have no outlier [10]. 

Empirical Mode Decomposition (EMD) is another signal decomposition method that is frequently 
explored in forecasting. Several studies in wind speed forecasting and annual precipitation 
forecasting [12]-[14] have shown that forecasting the IMFs of EMD results in more accurate forecasting 
compared to direct forecasting. EMD have been shown to be able to increase forecast accuracy of Relevance 
Vector Machine (RVM) [14]. 

Jerome Gilles [15] proposed empirical wavelet transform (EWT) that identifies the many unalike 
intrinsic modes of a time series and extracts them. Compared to EMD, EWT gives a more consistent 
decomposition [15]. Since EWT is conceptually similar to DWT, it was found that using EWT in time series 
prediction increases the accuracy. EWT has been used in wind speed forecasting along with Gaussian Process 
Regression (GPR), and it was found out that the models that utilized EWT have higher accuracy compared to 
the models that does implement EWT [7], [16]. In both studies, EWT was used to denoise the input data. 

Several studies have explored on the use of clustering analysis to further improve the forecast 
accuracy [17]. In [18], IMF from EMD are clustered using k-means clustering on the instantaneous frequency 
of the IMFs. In [19], [20], Permutation Distribution Clustering (PDC) clustering is employed with EMD and 
Least Support Square Vector Machine (LSSVM) to forecast exchange rate. From the studies, 
clustering generally increases the accuracy of the forecasts. Fuzzy c-means clustering has also been 
implemented in forecasting [21]. However, currently there are no studies done on ensemble forecasting based 
on c-means clustering. 

The objective of this research paper to investigate the performance of ARIMA, EWT ARIMA and 
modified EWT-ARIMA with fuzzy c-means clustering regarding to their performance in forecasting drought 
using Standard Precipitation Index (SPI). SPI 3, SPI 6, SPI 9, SPI 12 and SPI 24 were used as indicator for 
short term and long-term drought. Mean absolute error (MAE) and root mean square error (RMSE) 
were utilized to measure the forecasting performance. ARIMA is widely used in drought forecasting, 
however it is not able to accurately forecast drought because ARIMA does not excel at non-linear data, which 
the SPI data are. 


2. RESEARCH METHOD 
2.1. Standard Precipitation Index 

The Standard Precipitation Index (SPI) was developed by McKee et al. [22] as a drought indicator 
which standards the rainfall excess/deficit on temporal and regional basis. To enable quantifying precipitation 
deficits of multiple time scales, many drought studies used SPI. These time scales show the impact of 
drought on the availability of the different water resources. Requiring only records of precipitation, 
SPI enables an analyst to determine the rarity of a drought or a non-typical wet event relative to a particular 
time scale and can be used worldwide [23]. A drought event occurs during the time when SPI is continuously 
negative. The drought event ends when the SPI turns positive. Table 1 shows the SPI based drought 
classification. 


Table 1. Drought Classification using SPI 








SPI Value Category 
Higher than 2.00 Wet (Extreme) 
1.50 until 1.99 Wet (Severe) 

1.00 until 1.49 Wet (Moderate) 
-0.99 until 0.99 Near Normal 
-1.49 until -1.00 Dry (Moderate) 

-1.99 until -1.50 Dry (Severe) 
Less than -2.00 Dry (Extreme) 
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2.1.1. ARIMA 

Scientific applications of time series model have been widespread, however, it has been limited in 
its application in drought forecasting [3]. The advantages of using time series model are their identification, 
estimation, and diagnostic check capable of systematic search for model development [24]. One of the 
commonly-used time series model is Autoregressive Integrated Moving Average (ARIMA). ARIMA is 
popular due to its statistical properties and its ability to implement various exponential smoothing 
models [25]. 

The general non-seasonal ARIMA (p,d,q) model is given by: 


by (B)V"x, = 0,(B)ar (1) 


where @(B) and 6(B)a; are polynomials of order p and q, respectively. For seasonal model, the seasonal 
ARIMA (p,d,q)(P,D,Q); Is given by: 


bp (B)Pp(BS)VIVE x; = O4(B)O9(B*)a, (2) 


where P, D, Q are the order of seasonal autoregression, number of seasonal differencing, and the order of 
seasonal MA respectively. Seasonal ARIMA models are used for time series that have cyclic features. In 
drought forecasting, the data may feature cyclic properties. 


2.2. Empirical Wavelet Transform 

The empirical wavelets can be defined as bandpass filters on each A,, where A, denotes each 
segment A, = [Wp-1,Wn] and N21 A, = [0,2]. V,> 0 TheEequation (3) and (4) follows defines the 
empirical scaling functions and empirical wavelets [7], [15] 


if iflo| < A—y)on 
on(w) = } cos |F8 (~ del — Wn + T) ) if(1-y)o, < lol <A +y)o, (3) 
0 otherwise 
1 f+ yon < lol S +y)One1 
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0 otherwise 


The function B(x) is an arbitrary C*([0,1]) function which satisfies the following properties 
_ (0ifx<0 ae 
B(x) = i; px = 08M BG) + BAX) = LV xe [0,1] 


The set {f,(t), {9,(t)}X_,} is a tight frame of L7(R) when y < min, (Geen), The inner 


nt+1it@n 
product of the signal and empirical wavelets yields the detail coefficients: 


wht) = (fon) = | FO =H ae (5) 
= f(t))(@a— 0)” 


The approximation coefficients are then calculated using the inner products and the scaling function 


wf.) = (Ftd = | FOP Hae 


= fe) (Fie 9), ” 


The reconstruction can be obtained: 
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FE) = wf (0,t) * di) + > wh.) + on(t) 


(7) 
N Vv 
= (eo w) *$,(o) +) Win, 0) * on(o)) 
n=1 
Considering this formalism, the empirical mode is given by: 
folt) = we 0, t) * 91 @) 
fie) = wy (k,t) * p,.() (8) 


2.3. EWT-ARIMA 

EWT borrows the framework from wavelet transform [6]. Thus, the application of wavelet 
transform can also be applied to EWT. Similar to wavelet decomposition, the time series is decomposed to a 
number of modes using EWT. Figure 1 shows the framework for EWT-ARIMA. The time series is 
decomposed into several modes using EWT. Numbers of studies have shown that three-level wavelet 
decomposition produces good forecasting result. Based on the papers that implemented EWT [7], [16], [26], 
the mode that is commonly used for EWT is 3. 

However, in [27], the author of EWT proposed a method to determine the number of modes for the 
decomposition. This method separates histograms into two classes, where the goal is to find the threshold T 
such that the intra-variance of the class is mimimal while the inter class variance is maximum. This study 
uses this method to determine the number of modes decomposed using EWT. The following are the process 
for performing EWT [15]: 

1. Perform pre-processing to detect peak. 

2. Perform spectrum separation based on the detected maxima. 
3. Construct corresponding wavelet filter bank. 

4. Construct the wavelets based on the wavelet filter bank. 
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Figure 1. Overall workflow for EWT-ARIMA 


2.4. Hilbert Transform 
Hilbert transform is defined as following: 





X(t) = Bp oe) dt' (9) 


t-t! 


Where p is the Cauchy principal value. The instantaneous frequency w(t) thus can be 


calculated using w(t) = ee , Where p(t) = arctan a 





2.5. Fuzzy C-means Clustering 

Fuzzy c-means(FCM) clustering is a clustering algorithm that determines the degree of membership 
of a data with regards to its cluster or class. The objective function of FCM algorithm is as follow and it is 
computed using the membership value and Euclidian distance. 
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Im(W.P) = D> Win)” dix)? 
osise (10) 
Xx 


where (dix) = |lxx — pill 
Minimizing the objective function, thus the FCM membership function is as follows [28]: 


-1 


Miy = ba (pe) (11) 


llxj-vell 4 





where j4;; is th membership value of jth sample and ith cluster, c is the number o cluster, x; is the jth 
sample and v; cluster center of the ith cluster. || ||, is the norm function. 


2.6. EWT-ARIMA Clustering Analysis 

The overall framework for the proposed method is as in Figure |. For both methods, firstly the SPI 
data is decomposed into several modes using EWT. Next, IMFs are constructed each of the modes from 
EWT. Hilbert transform are done on each of the IMFs to obtain its instantaneous frequency, where they will 
be clustered using fuzzy c-means clustering. From the obtained clusters, 3 subseries are created by summing 
the IMFs according tho the cluster information. Then, the high, medium, and low frequency series are fitted 
with ARIMA models. Forecast values are obtained for each of the 3 subseries. Lastly, for both methods, 
the forecasting results are combined to obtain the final forecasting result. Figure 2 shows the overall process 
for EWT-ARIMA Clustering analysis. 
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Figure 2. Overall workflow for modified EWT-ARIMA 


2.7. Performance Comparison 
The performance of each method is evaluated by comparing the error statistics. The error statistics 
used were MAE and RMSE. They are given by: 


MAE = ~ yy IEST; = GBS. | (12) 





RMSE = [paacesn — OBS,)? (13) 


Where EST; and OBS; denote ith estimated and observed values respectively. The lower the value of 
error, the better is the performance of the forecasting model. 


2.8. Study Area 


The physical area covered in this study is the area around Arau, Perlis. The data consist of daily 
rainfall value from 1956 to 2008. 
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Table 2. Statistical Description of Data 
Mean St. dev. Min. Max. 


Rainfall 5.459 12.598 0.000 180.300 











3. RESULTS AND ANALYSIS 

In this section, it is explained the results of research and at the same time is given the 
comprehensive discussion. Results can be presented in figures, graphs, tables and others that make the reader 
understand easily [2], [5]. The discussion can be made in several sub-chapters. 


3.1. ARIMA 

For ARIMA, there are three stages in time series model development, namely: identification, 
estimation, and diagnostic check [29]. For each of the SPI time scale, data set from 1956 to 1998 is applied in 
the estimation of the model parameters. Data from 1999 to 2008 is used to determine the accuracy of the 
forecast. SPI 3, 6, 9, 12 and 24 are used in this study. SPI is calculated from the rainfall data. Figure 3 shows 
the data consisting of SPI 3, 6 and 12. 





SPI3 
° 











1 1 1 1 1 
10) 100 200 300 400 500 600 
Time in months (1956 to 2008) 


Figure 3. SPI 3 data over from 1956 to 2008 


Figure 2 shows the autocorrelation function (ACF) and partial autocorrelation function (PACF) used 
of SPI 3. From the ACF and PACF plot, value suitable for ARIMA model obtained are p = 1,2,3,4 , 
q =0,1,2, P=1,2 , and Q =0. The combination of models suitable are obtained from the value and the 
fitness of the models determined. 
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Figure 4. ACF and PACF of SPI 3 
To select the best fit model, Akaike Information Criterion (AIC) is used. The best fit model is selected from 
the model with the lowest AIC. The mathematical formula for AIC is defined as [30]: 
AIC = —2logL+2m (14) 
where m = (p + q + P + Q) is the number of terms estimated in the model and L signifies the likelihood of 


the ARIMA models and it is monotonically decreasing function of the sum of squared residuals. The best fit 
model for SPI 3 obtained is (2,0,2). 
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Figure 5. Diagnostic Checking of ARIMA 


To verify that a model is adequate, diagnostic check is performed among the identified models. 
This includes the study of residuals to see if there are patterns unaccounted for. After fitting the model, 
the residual left over should be white noise for a model to be considered as a good model. The residual ACF 
is used to determine whether the residuals are white noise. Residuals that are largely different from zero give 
an indication that the model may be inadequate. Figure 5 shows the diagnostic check of the ARIMA model 
obtained for SPI 3. 


3.2. EWT-ARIMA 

EWT is used to decompose the SPI data into several IMF. The number of IMF depends on the data, 
where it is determined using Otsu’s method [27]. Table 3 shows the number of IMFs generated by EWT 
process. Figure 6 shows SPI 12 decomposed to 6 IMFs using EWT. ARIMA model is developed for each of 
the subseries, using auto ARIMA function found in R, and it is used to perform the forecast. The result of the 
forecast on each of the subseries is obtained, which is then used to reconstruct the original time series. 
The result of the reconstruction is the forecast result of EWT-ARIMA. 












































5 T T T T T T 
0 KS DP OAL) far YIN + 
5 L L rl L L L 
0 100 200 300 400 500 600 700 
1 T T T T T T 
Deed A A ce | a cae : 
4 L L rl L L L 
0 100 200 300 400 500 600 700 
0.5 T T T T T T 
off-set 
-0.5 : 
0 100 200 300 400 500 600 700 
0.5 T T T T T T 
0 jan anim Nf wor | 
0.5 L L I L L L 
0 100 200 300 400 500 600 700 
0.5 T T T T T T 
0 fallin petra SI Hhvnoanhomott AN Ah yn vf hl AE =| 
0.5 i i 1 1 i i 
0 100 200 300 400 500 600 700 
0.5 T T T T T T 
© Ht sll te i i ila swf i ee ee ee Aston see ieee <= 
0.5 L L rl L L L 
0 100 200 300 400 500 600 700 


Figure 6. EWT decomposition of SPI 12 
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Table 3. Number of IMFs produced by EWT 
SPI SPI3 SPI6 SPI9 SPI 12 SPI 24 
Number of IMFs 12 6 5 6 12 











3.2. EWT-ARIMA Clustering 

Fuzzy c-means clustering is used to cluster the IMF created from EWT into a certain number of 
cluster. The number of cluseter chosen is 3. Several studies have used 3 cluster [17], [18]. Figure 7 shows the 
combined into 3 wavelets based on the result of FCM clustering. IMF 1, 2 is combined to form the low 
frequency compoment. IMF 3,4,5 is combined to form the medium frequency component while IMF 6 is 
used as the high frequency component. Table 4 shows the details of the components for each cluster. From 
the components, ARIMA is used to perform forecasts. 
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Figure 7. 3 cluster from EWT decomposition on SPI 12 


Table 4. Components for Each Clusters 








SPI Cluster Components 
SPI3 Gl= {IMF1-IMF5} 
G2=  {IMF6-IMF9} 
G3=  {IMF10-IMF12} 
SPI6 Gl=  {IMF1-IMF2} 
G2= {IMF3-IMF4} 
G3=  {IMF5-IMF6} 
SPI9 Gl=  {IMFI1-IMF2} 
G2=  {IMF3-IMF4} 
G3= {IMF5} 
SPI Gl= {IMF1-IMF2} 
12 G2= {IMF3-IMF5} 
G3=  {IMF6} 
SPI Gl= {IMF1-IMF4} 
24 G2= {IMF5-IMF8} 
G3=  {IMF9-IMF12} 











3.2. Performance Evaluation 

The result of the study is obtained as in Table 5. MAE and RMSE is calculated from the forecasting 
results of each models and are compared. Table 5 shows the MAE and RMSE for testing data of the models 
on each SPI. Lower values of MAE and RMSE indicates a higher forecast accuracy. 

EWT-ARIMA performed better compared to ARIMA on all SPI. This is due to the nonlinear nature 
of SPI data, which cannot be forecasted accurately using a linear method such as ARIMA. EWT separates the 
linear and nonlinear component of the data, thus ARIMA is able to forecast them with greater accuracy. 
This show that EWT is suitable to decompose the non-linear nature of SPI data. 

The proposed model performed better compared to ARIMA and EWT-ARIMA on all SPI except 
SPI 9 and 3. This indicates that the proposed model is suitable to be used for long term drought forecasting. 
Comparing ARIMA, EWT-ARIMA and modified EWT-ARIMA, it is generally safe to assume the proposed 
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modified EWT-ARIMA performs the best, since it preformed better on 3 out of the 5 SPI tested. The result 
agrees with previous reseach that has proven that clustering on IMFs from EMD improves the forecast 
accuracy [18]-[20]. Implementation of fuzzy c-means clustering improves the forecast accuracy by grouping 
the IMFs according to its structure based on the intrinsic frequency of the IMFs. 


Table 5. MAE and RMSE of the Models 








Models ARIMA EWT-ARIMA EWT-ARIMA Clustering 
SPI MAE RMSE MAE RMSE MAE RMSE 
SPI3 0.4361 0.5619 0.3655 0.4764 0.3834 0.4982 
SPI6 0.2979 0.3973 0.2744 0.3512 0.1068 0.1397 
SPI9 0.2472 0.3250 0.1165 0.1440 0.1275 0.1621 
SPI 12 0.2230 0.2919 0.1570 0.2021 0.0703 0.0891 
SPI 24 0.1557 0.2008 0.1446 0.1857 0.0586 0.0732 





4. CONCLUSION 

This research compares ARIMA and EWT ARIMA and EWT-ARIMA clustering. SPI is used as the 
drought index and SPI 3, 6, 9, 12 and 24 was calculated using rainfall data from Arau, Perlis. ARIMA model 
was developed using Box-Jenkins methodology for each SPI. This research uses MAE and RMSE to 
compare the effectiveness of the prediction from the studied models. It was found that EWT-ARIMA 
clustering performed better compared to ARIMA and EWT-ARIMA on SPI 6, 12, and 24. EWT-ARIMA 
performed better compared to EWT-ARIMA Clustering on SPI 3 and 9. 
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